Skip to content

geotiff: route read_vrt(chunks=) XML read through size cap (#1831)#1832

Merged
brendancol merged 1 commit into
mainfrom
deep-sweep-security-geotiff-2026-05-13-s4-fix01
May 14, 2026
Merged

geotiff: route read_vrt(chunks=) XML read through size cap (#1831)#1832
brendancol merged 1 commit into
mainfrom
deep-sweep-security-geotiff-2026-05-13-s4-fix01

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

PR #1818 capped VRT XML reads at 64 MiB (override via XRSPATIAL_VRT_MAX_XML_BYTES) in _vrt._read_vrt_xml so an attacker-supplied multi-GB VRT file cannot exhaust host memory at parse time. The lazy chunked dispatcher landed in #1822 (_read_vrt_chunked) parses the VRT independently with a bare open(source, 'r').read(), bypassing that cap; only the per-task worker reuses the capped reader.

This routes the chunked-path XML read through _read_vrt_xml so the same cap and env-var override apply on both eager and chunked entry points. Closes #1831.

Repro (pre-fix)

import os, tempfile, numpy as np
from xrspatial.geotiff import to_geotiff, read_vrt

td = tempfile.mkdtemp()
to_geotiff(np.zeros((10, 10), np.uint8),
           os.path.join(td, 'src.tif'), compression='none')

vrt = os.path.join(td, 'big.vrt')
pad = '<!-- ' + ('x' * (2 * 1024 * 1024)) + ' -->'
open(vrt, 'w').write(
    '<VRTDataset rasterXSize="10" rasterYSize="10">' + pad +
    '<VRTRasterBand dataType="Byte" band="1"><SimpleSource>'
    '<SourceFilename relativeToVRT="1">src.tif</SourceFilename>'
    '<SourceBand>1</SourceBand>'
    '<SrcRect xOff="0" yOff="0" xSize="10" ySize="10"/>'
    '<DstRect xOff="0" yOff="0" xSize="10" ySize="10"/>'
    '</SimpleSource></VRTRasterBand></VRTDataset>')

os.environ['XRSPATIAL_VRT_MAX_XML_BYTES'] = '1024'
read_vrt(vrt)              # eager: ValueError (cap enforced)
read_vrt(vrt, chunks=10)   # chunked: succeeds, cap bypassed

After this PR both raise the same ValueError referencing XRSPATIAL_VRT_MAX_XML_BYTES.

Test plan

  • xrspatial/geotiff/tests/test_vrt_xml_size_cap_chunked_1831.py - new file, 3 tests pinning small-cap rejection, default-cap success, raised-cap success for chunks=.
  • Existing test_vrt_xml_size_cap_1815.py (eager path) still passes.
  • Existing test_vrt_lazy_chunks_1814.py (chunked happy path) still passes.

Found by automated security sweep (deep-sweep-security-geotiff-2026-05-13-s4), Cat 1 (unbounded allocation), severity MEDIUM.

PR #1818 capped VRT XML reads at 64 MiB (configurable via
XRSPATIAL_VRT_MAX_XML_BYTES) in xrspatial/geotiff/_vrt.py::_read_vrt_xml
to keep a multi-GB attacker-supplied VRT file from exhausting memory at
parse time.

The chunked dispatcher merged in #1822 (_read_vrt_chunked in
xrspatial/geotiff/__init__.py) parses the VRT independently with a
bare open().read() before calling parse_vrt, so the cap did not apply
to read_vrt(path, chunks=...). The per-task _vrt_chunk_read worker
calls _read_vrt_internal (which goes through _read_vrt_xml), so only
the parent dispatch path was uncapped.

Route the chunked-path XML read through _read_vrt_xml so the same cap
and env-var override apply on both eager and chunked entry points.

Regression test pins the small-cap rejection, the default-cap success,
and the raised-cap success for chunks=.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 14, 2026
@brendancol brendancol requested a review from Copilot May 14, 2026 04:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR closes a security regression where read_vrt(..., chunks=...) bypassed the VRT XML size cap introduced in #1818 by performing an unbounded open().read(). It routes the chunked dispatcher’s upfront XML load through _vrt._read_vrt_xml so both eager and chunked entry points consistently enforce XRSPATIAL_VRT_MAX_XML_BYTES.

Changes:

  • Update _read_vrt_chunked to read XML via _read_vrt_xml instead of open().read(), applying the same 64 MiB default cap and env-var override.
  • Add regression tests covering cap rejection, default-cap success, and raised-cap success for the chunked (chunks=) path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
xrspatial/geotiff/__init__.py Routes chunked-path VRT XML reading through _read_vrt_xml to enforce the size cap consistently.
xrspatial/geotiff/tests/test_vrt_xml_size_cap_chunked_1831.py Adds regression tests ensuring chunks= honors XRSPATIAL_VRT_MAX_XML_BYTES.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@brendancol brendancol merged commit a23abf4 into main May 14, 2026
16 checks passed
@brendancol brendancol deleted the deep-sweep-security-geotiff-2026-05-13-s4-fix01 branch May 15, 2026 04:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: read_vrt(chunks=) bypasses VRT XML size cap from #1818

2 participants